11 research outputs found

    RDFC-GAN: RGB-Depth Fusion CycleGAN for Indoor Depth Completion

    The raw depth image captured by indoor depth sensors usually has an extensive range of missing depth values, due to inherent limitations such as the inability to perceive transparent objects and the limited distance range. The incomplete depth map with missing values burdens many downstream vision tasks, and a growing number of depth completion methods have been proposed to alleviate this issue. While most existing methods can generate accurate dense depth maps from sparse and uniformly sampled depth maps, they are not suitable for completing large contiguous regions of missing depth values, which are common and critical in images captured in indoor environments. To overcome these challenges, we design a novel two-branch end-to-end fusion network named RDFC-GAN, which takes a pair of RGB and incomplete depth images as input and predicts a dense, completed depth map. The first branch employs an encoder-decoder structure that adheres to the Manhattan world assumption and uses normal maps derived from the RGB-D input as guidance to regress local dense depth values from the raw depth map. In the other branch, we propose an RGB-depth fusion CycleGAN that translates the RGB image into a fine-grained textured depth map. We adopt adaptive fusion modules named W-AdaIN to propagate features across the two branches, and we append a confidence fusion head that fuses the two branch outputs into the final depth map. Extensive experiments on NYU-Depth V2 and SUN RGB-D demonstrate that our proposed method clearly improves depth completion performance, especially in a more realistic indoor setting, with the help of our proposed pseudo depth maps in training.
    Comment: Haowen Wang and Zhengping Che contributed equally. Under review. An earlier version was accepted by CVPR 2022 (arXiv:2203.10856).
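
    The confidence fusion head described in this abstract can be pictured as a learned per-pixel blend of the two branch outputs. Below is a minimal PyTorch sketch of that idea; the module name, channel widths, and layer choices are illustrative assumptions of mine, not the RDFC-GAN implementation (which also relies on the W-AdaIN fusion modules).

```python
import torch
import torch.nn as nn

class ConfidenceFusionHead(nn.Module):
    """Blend two branch depth predictions with a learned per-pixel weight.

    A minimal sketch of the confidence-fusion idea; channel widths and
    layer choices are assumptions, not the RDFC-GAN implementation.
    """

    def __init__(self, feat_channels: int = 64):
        super().__init__()
        # Predict a confidence weight in [0, 1] at every pixel from the
        # concatenated features of the two branches.
        self.conf = nn.Sequential(
            nn.Conv2d(2 * feat_channels, feat_channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(feat_channels, 1, 1),
            nn.Sigmoid(),
        )

    def forward(self, depth_a, depth_b, feat_a, feat_b):
        # depth_a, depth_b: (B, 1, H, W) depth maps from the two branches.
        # feat_a, feat_b:   (B, C, H, W) features from the two branches.
        w = self.conf(torch.cat([feat_a, feat_b], dim=1))
        return w * depth_a + (1.0 - w) * depth_b
```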

    DTF-Net: Category-Level Pose Estimation and Shape Reconstruction via Deformable Template Field

    Estimating 6D poses and reconstructing 3D shapes of objects in open-world scenes from RGB-depth image pairs is challenging. Many existing methods rely on learning geometric features that correspond to specific templates while disregarding shape variations and pose differences among objects in the same category. As a result, these methods underperform when handling unseen object instances in complex environments. In contrast, other approaches aim to achieve category-level estimation and reconstruction by leveraging normalized geometric structure priors, but the static prior-based reconstruction struggles with substantial intra-class variations. To solve these problems, we propose DTF-Net, a novel framework for pose estimation and shape reconstruction based on implicit neural fields of object categories. In DTF-Net, we design a deformable template field to represent the general category-wise shape latent features and the intra-category geometric deformation features. The field establishes continuous shape correspondences, deforming the category template into arbitrary observed instances to accomplish shape reconstruction. We introduce a pose regression module that shares the deformation features and template codes from the field to estimate the accurate 6D pose of each object in the scene. We integrate a multi-modal representation extraction module to extract object features and semantic masks, enabling end-to-end inference. Moreover, during training, we implement a shape-invariant training strategy and a viewpoint sampling method to further enhance the model's capability to extract object pose features. Extensive experiments on the REAL275 and CAMERA25 datasets demonstrate the superiority of DTF-Net in both synthetic and real scenes. Furthermore, we show that DTF-Net effectively supports grasping tasks with a real robot arm.
    Comment: The first two authors contributed equally. Paper accepted by ACM MM 2023.
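
    To make the deformable template field concrete, here is a minimal PyTorch sketch of the core idea: a shared category-level template SDF queried through a per-instance deformation MLP. All layer sizes, the latent code dimension, and the SDF output choice are illustrative assumptions rather than the DTF-Net configuration.

```python
import torch
import torch.nn as nn

class DeformableTemplateField(nn.Module):
    """A category template SDF queried through a per-instance deformation MLP.

    Layer sizes and the latent code dimension are illustrative assumptions,
    not the DTF-Net configuration.
    """

    def __init__(self, code_dim: int = 256, hidden: int = 256):
        super().__init__()
        # Deformation MLP: (query point, instance code) -> 3D offset that
        # maps the observed instance back into the canonical template frame.
        self.deform = nn.Sequential(
            nn.Linear(3 + code_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 3),
        )
        # Shared category template: canonical point -> signed distance.
        self.template_sdf = nn.Sequential(
            nn.Linear(3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, points, code):
        # points: (B, N, 3) query points in the instance frame.
        # code:   (B, code_dim) latent deformation code for each instance.
        code_exp = code.unsqueeze(1).expand(-1, points.shape[1], -1)
        offsets = self.deform(torch.cat([points, code_exp], dim=-1))
        # Query the shared template at the deformed (canonical) locations.
        return self.template_sdf(points + offsets)
```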

    Bayesian Network-Based Service Context Recognition Model


    6G Vision: An AI-Driven Decentralized Network and Service Architecture


    Edge-Assisted Distributed DNN Collaborative Computing Approach for Mobile Web Augmented Reality in 5G Networks

    Web-based DNNs provide accurate object recognition for mobile Web AR, which is newly emerging as a lightweight mobile AR solution, and are attracting a great deal of attention. However, balancing the UX against the computing cost of DNN-based object recognition on the Web is difficult for both self-contained and cloud-based offloading approaches, as it is a latency-sensitive service with high demands on computing and networking capability. Fortunately, the emerging 5G networks promise not only bandwidth and latency improvements but also the pervasive deployment of edge servers closer to users. In this article, we propose the first edge-based collaborative object recognition solution for mobile Web AR in the 5G era. First, we explore fine-grained and adaptive DNN partitioning for collaboration among the cloud, the edge, and the mobile Web browser. Second, we propose a differentiated DNN computation scheduling approach specially designed for the edge platform. On one hand, performing part of the DNN computation on the mobile Web without degrading the UX (i.e., keeping response latency below a specific threshold) effectively reduces the computing cost of the cloud system; on the other hand, performing the remaining DNN computation in the cloud (both remote and edge) also improves inference latency, and thus the UX, compared to the self-contained solution. Our collaborative solution therefore balances the interests of both users and service providers. Experiments conducted in an actually deployed 5G trial network show the superiority of our proposed collaborative solution.
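
    As a toy illustration of latency-constrained DNN partitioning of this kind, the sketch below exhaustively searches for the split point that keeps end-to-end latency under a budget while minimizing server-side compute. The cost model, the Layer fields, and the single-cut assumption are simplifications of mine, not the paper's scheduler.

```python
from dataclasses import dataclass

@dataclass
class Layer:
    device_ms: float  # estimated run time of this layer in the mobile browser
    server_ms: float  # estimated run time of this layer on the edge/cloud
    out_kb: float     # size of this layer's output activation, in kilobytes

def pick_partition(layers, input_kb, uplink_kb_per_s, budget_ms):
    """Pick a split k: layers[:k] run on-device, layers[k:] on the server.

    Among all splits that keep end-to-end latency within the budget, return
    the one with the least server-side compute (cheapest for the provider).
    """
    best_k, best_server_ms = None, float("inf")
    for k in range(len(layers) + 1):
        device_ms = sum(l.device_ms for l in layers[:k])
        server_ms = sum(l.server_ms for l in layers[k:])
        if k == len(layers):
            transfer_ms = 0.0  # fully self-contained: nothing to upload
        else:
            # Upload the raw input (k == 0) or the cut-point activation.
            tx_kb = input_kb if k == 0 else layers[k - 1].out_kb
            transfer_ms = tx_kb / uplink_kb_per_s * 1000.0
        total_ms = device_ms + transfer_ms + server_ms
        if total_ms <= budget_ms and server_ms < best_server_ms:
            best_k, best_server_ms = k, server_ms
    return best_k  # None if no split can meet the latency budget
```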

    Toward holographic video communications: a promising AI-driven solution

    Real-time holographic video communications enable immersive experiences for next-generation video services in the future metaverse era. However, high-fidelity holographic videos require high bandwidth and significant computation resources, which exceed the transfer and computing capacity of 5G networks. This article reviews state-of-the-art holographic point cloud video transmission techniques and highlights the critical challenges of delivering such immersive services. We further implement a preliminary prototype of an AI-driven holographic video communication system and present key experimental results to evaluate its performance. Finally, we identify future research directions and discuss potential solutions for providing real-time, high-quality holographic experiences.
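
    A quick back-of-envelope calculation shows why raw point cloud video exceeds 5G capacity. The point count, per-point size, and frame rate below are illustrative assumptions, not figures from the article.

```python
# Rough bandwidth estimate for uncompressed point cloud video.
# All numbers are illustrative assumptions.
points_per_frame = 1_000_000   # a dense volumetric capture of a person
bytes_per_point = 15           # three 4-byte float coordinates + 3-byte RGB
fps = 30                       # frames per second

raw_gbps = points_per_frame * bytes_per_point * 8 * fps / 1e9
print(f"Raw rate: {raw_gbps:.1f} Gbit/s")  # ~3.6 Gbit/s, far above typical 5G user throughput
```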

    Nano-Montmorillonite Regulated Crystallization of Hierarchical Strontium Carbonate in a Microbial Mineralization System

    In this paper, nano-montmorillonite (nano-MMT) was introduced into the microbial mineralization system of strontium carbonate (SrCO3). The mineralization mechanism was studied by varying the nano-MMT concentration and the mineralization time. SrCO3 superstructures with complex forms were obtained in the presence of nano-MMT as a crystal-growth regulator. At low nano-MMT concentrations, a cross-shaped SrCO3 superstructure was obtained. As the concentration increased, flower-like SrCO3 crystals formed via dissolution and recrystallization processes. A self-assembly and crystal-polymerization mechanism is proposed for the formation of complex flower-like SrCO3 superstructures at high nano-MMT concentrations. This work indicates that such bionic synthesis strategies in microbial systems can not only provide a useful route to inorganic or inorganic/organic composites with novel morphologies and unique structures but also offer new ideas for the treatment of radionuclides.